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Abstract 

The Reversible Jump (RJ) algorithm (Green, 1995) is one of the most used 
Markov chain Monte Carlo algorithms for Bayesian estimation and model se- 
lection. We propose a generalized Multiple-try version of this algorithm which 
is based on drawing several proposals at each step and randomly choosing one 



CN) . of them on the basis of weights (selection probabilities) that may be arbitrary 

o 

^ . 

, the possible choices, a method based on selection probabilities depending on 



chosen. Along the same lines as in Pandolfi et al. (2010), we exploit, among 



a quadratic approximation of the posterior distribution. Moreover, we extend 
^ . these results by showing the implementation of an efficient transdimensional 

algorithm for challenging model selection problems. The resulting algorithm 
leads to a gain of efficiency with respect to the RJ algorithm also in terms of 
computational effort. We illustrate the approach by real examples involving a 
logistic regression model and a latent class model. 
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1 Introduction 



Markov chain Monte Carlo (MCMC) methods are increasing their relevance for sta- 
tistical inference, in particular Bayesian inference. In variable dimension problems, 
that mainly arise in the context of Bayesian model selection, a well-known approach 
is the Reversible Jump (RJ) algorithm proposed by Green (1995). The algorithm uses 
the Metropolis Hastings (MH) paradigm (Metropolis et al., 1953; Hastings, 1970) in 
order to generate a reversible Markov chain which jumps between models with dif- 
ferent parameter space dimension. These jumps are achieved by proposing a move 
to a different model, and accepting it with appropriate probability in order to en- 
sure that the chain has the required stationary distribution. The algorithm presents 
some potential drawbacks that limit its applicability. Ideally, the proposed moves 
are designed so that the different models are adequately explored. However, the ef- 
ficient construction of these moves is generally difficult since there is no natural way 
to choose jump proposals (see, among others. Green, 2003). 

Several approaches have been proposed in order to automate the RJ algorithm 
and improve its efficiency. In particular. Green and Mira (2001) exploited the delayed 
rejection method which is based on a modified between-model move, conditional on 
the rejection of an initial trial. Moreover, Brooks et al. (2003) proposed two main 
classes of methods. The first class is based on the efficient parametrization of the 
proposal density, imposing various constraints on the acceptance ratio. The second 
class, instead, generalizes the RJ algorithm by exploiting the so-called saturated space 
approach. The idea is to augment the state space with auxiliary variables (to ensure 
that all models share the same dimension as the largest model) and then to induce 
temporal memory and possible dependency in these variables. This allows the chain 
to have same memory of the states visited in other models, increasing the efficiency 
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of the proposals. Another approach to improve the efficiency of the RJ algorithm 
was proposed by Al-Awadhi et al. (2004), who used a secondary Markov chain to 
modify an initial proposal, allowing it to move from a low probability region of the 
new space towards a mode, before comparing it with the starting state. Fan et al. 
(2009) proposed the use of a convenient marginal path sampling density estimator to 
construct between-model proposal distributions. Finally, the idea of a fully automated 
RJ sampler was considered in Green and Hastie (2009), to which we also refer for a 
detailed review of the main methodological extensions of the RJ algorithm. Another 
interesting approach was proposed by Bartolucci et al. (2006), whose aim was not to 
improve the efficiency of the RJ algorithm but just to make use of its output, in order 
to construct a class of more efficient estimators of the Bayes factor. 

In this paper, we extend the results illustrated in Pandolfi et al. (2010) in which 
a generalization of the Multiple-try Metropolis (MTM) algorithm of Liu et al. (2000) 
is proposed in the context of Bayesian estimation and Bayesian model choice. In 
particular we develop their idea of applying an MTM strategy to improve the RJ 
algorithm in the context of Bayesian model choice, where the dimensionality of the 
parameter space is also part of the model uncertainty. 

In general, the MTM algorithm represents an extension of the MH algorithm 
consisting of proposing, at each step, a certain number of trials and then selecting 
one of them with a suitable probability. The selection probabilities of each proposed 
value are constrained so to attain the detailed balance condition. In particular, Liu 
et al. (2000) propose a rule to choose these probabilities so that they are proportional 
to the product of the target, the proposal, and a A-function which is nonnegative and 
symmetric. These constraints lead to an algorithm that could be not so efficient as 
expected, because it becomes computationally intensive, especially when the number 
of trials is large. 
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The generalization of the MTM scheme proposed by Pandolfi et al. (2010), here- 
after indicated by GMTM, defines the selection probabilities in a more general way. 
Under this approach, minimal constraints are required to attain the detailed balance 
condition. In principle, any mathematical function giving valid probabilities may be 
adopted to select among the proposed trials. In any case, the adopted function is 
crucial for the efficiency in the estimation of target distribution. 

In the Bayesian model choice context, the GMTM extension of the RJ algorithm 
represents a rather natural way to overcome some of the typical problems of this algo- 
rithm. In particular, among the possible ways to compute the selection probabilities, 
we suggest a method based on a quadratic approximation of the target distribution 
that may lead to a considerable gain of efficiency. Moreover, we show that, when it is 
not possible to easily compute this quadratic approximation, the generalized version 
may again lead to an efficient algorithm, provided that the weights may be simply 
computed. 

The paper is structured as follows. In Section 2 we review the MH algorithm 
and the RJ algorithm and we introduce the basic concept of GMTM algorithm for 
Bayesian estimation. In Section 3 we outline the MTM version of the RJ algorithm 
with a discussion on some convenient choices of the selection probabilities. The pro- 
posed approach is illustrated in Section 4 by some applications. Section 5 concludes. 

2 Preliminaries 

We first introduce the basic notation for the MH and RJ algorithms and we briefly 
review the GMTM method. 
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2.1 Metropolis Hastings and Reversible Jump algorithms 

The MH algorithm, proposed by Metropohs et al. (1953) and modified by Hastings 
(1970), is one of the best known MCMC method that can be used to generate a 
random sample from a target distribution, from which direct sampling is difficult. It 
is well known that the basic idea of this algorithm is to construct an ergodic Markov 
chain in the state space of x that has 7r(x) as its stationary distribution. 

A potential problem with the MH method is that the resulting samples are often 
highly correlated. Therefore the estimates resulting from these samples tend to have 
high variance. Moreover, the Metropolis-type local moves often lead to slow con- 
verging algorithms that are easily trapped in a local mode. Another challenge is the 
choice of an efficient proposal step. In fact, it is often the case that a small step-size 
in the proposal transition will result in slow convergence of the corresponding Markov 
chain, whereas a large step-size will result in very low acceptance rate (Liu, 2001). 

In the Bayesian model choice context, the MH algorithm was extended by Green 
(1995) so as to allow the simulation of the posterior distribution on space of varying 
dimensions. It results the RJ algorithm. 

Let {A^i, . . . , A^m} denote the set of available models and, for model J^m, let 
9m be the parameter space, whose elements are denoted by 6m- Also let /(y|m, 6m) 
be the likelihood for an observed sample y, p{6m\^) be the prior distribution of the 
parameters and let p(m) be the prior probability of model /Am- 

In simulating from the target distribution, a sampler must move both within 
and between models. Moreover, the move from the current state of Markov chain 
(m, 6m) to a new state has to be performed so as to ensure that the detailed balance 
condition holds. The solution proposed by Green (1995) is to introduce a set of 
auxiliary variables such that all states of the chain have the same dimension. For 
instance, a jump between models A4i and Aij is achieved supplementing each of the 
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parameter space Bi and Qj with artificial spaces in order to create a bijection between 
them and to impose a dimension matching condition; see also Brooks et al. (2003). In 
particular, let (m, Om) be the current state of Markov chain, where 6m has dimension 
d{Om), the algorithm performs the following steps: 

Step 1: Select a new candidate model Aim' with probability j{m,m'). 

Step 2: Generate u (which can be of lower dimension than 6^) from a specified 
proposal density T{Om, ■)• 

Step 3: Set {0'^,,u') = gm,m'{Gm,u) where gm,m' is a specified invertible function. 
Hence d{Om) + d{u) = d{e'^,) + d{u'). 

Step 4: The acceptance probability of the new model is: 

dgm,m'{Gm, u) 



. fiy\m', O'm') pjO'm' W) p{m') j{m', m) TjO'^,, u') 

a = mm s i 

1 ' f{y\m,dm)p{Om\m)p{m)j{m,m')T{Om,u) 



d{Om,u) 



where the last term is the Jacobian of the transformation from the current value 
of the parameter to the new value. 

The main difficulty in the implementation of the RJ algorithm regards the con- 
struction of an efficient proposal that jumps between models. Inefficient proposal 
mechanisms, could in fact result in Markov chains that are slow in exploring the 
state space and in converging to the stationary distribution. Moreover, the Markov 
chain will have higher autocorrelation and higher asymptotic variance with respect 
to Monte Carlo estimators. Generally, in order to ensure efficient proposal steps, the 
proposed new state should have similar posterior support to the existing state (Green 
and Hastie, 2009). Moreover, pilot run could be convenient to tune different aspects 
of the proposal mechanism. 
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In addition to the RJ algorithm, several alternative MCMC approaches have been 
proposed in Bayesian model and variable selection context. These methods are based 
on the estimation of the posterior probabilities of the available models or on the 
estimation of marginal likelihoods; for a review see Han and Carlin (2000), Dellaportas 
et al. (2002) and Green (2003). 

The RJ algorithm belongs to the first class of methods, which also includes the 
product space search of Carlin and Chib (1995), the Metropolized Carlin and Chib 
method of Dellaportas et al. (2002) and the related composite model space approach 
of Godsill (2001). In particular, the Carlin and Chib (1995) scheme works in a 
product space framework for all possible model parameters and model indicator, so 
as the space has constant dimensionality and a Gibbs sampler can be used to generate 
samples from the posterior distribution. Dellaportas et al. (2002) proposed instead 
a hybrid Gibb/Metropolis strategy in which the model selection step is not based 
on the full conditional, but on a proposal for a move from the current model to 
the new model, followed by acceptance or rejection of this proposal. The composite 
model space method of Godsill (2001) is an RJ algorithm that takes advantage of 
"common" parameters in the context of variable selection. The marginal likelihood 
approach was proposed by Chib (1995) and extended by Chib and Jeliazkov (2001). 
This method is based on the estimation of the marginal likelihood of any available 
model from the output of the MCMC algorithm. Moreover, a powerful method for 
estimating the Bayes factor has been introduced by Meng and Wong (1996) on the 
basis of the bridge sampling identity. 

The RJ algorithm has also been applied to the Bayesian analysis of data from a 
finite mixture distribution with an unknown number of components (Richardson and 
Green, 1997). This approach is based on a series of transdimensional moves (i.e. split- 
combine and birth-death moves) that allow the joint estimation of the parameters and 
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the number of components. In particular, the posterior probabihty of the number of 
components is proportional to the number of visits of the corresponding model. An 
interesting alternative method has been proposed by Stephens (2000) in which a 
continuous time version of the RJ algorithm is proposed for the analysis of finite 
mixture models. This approach consists of performing a birth or a death move with 
a certain probability. The chosen move is always accepted and the time to the next 
move is simulated from a suitable distribution. The posterior probability of a certain 
number of components is then computed as a quantity proportional to the overall 
permanence time in the corresponding model. 

2.2 Generalized Multiple-try method 

The Generalized MTM algorithm (GMTM) introduced by Pandolfi et al. (2010) use 
selection probabilities of the trials proposed proportional to a given weighting function 
w*{y, x) that can be easily computed, so as to increase the number of trials without 
loss of efficiency. 

2.2.1 The algorithm 

Let w*{y,x) be an arbitrary weighting function satisfying w*{y,x) > 0. Let Xt be 
the current state of Markov chain at iteration t. The GMTM algorithm performs the 
following step: 

Step 1: Draw k trials yi, . . . , from a proposal distribution T{xt, ■). 
Step 2: Select a point y from the set {yi, . . . , y/^} with probability given by 

^ w*{y,Xt) 
Y.Uw*{y^,Xt)' 

Step 3: Draw realizations x\, . . . , x\_^ from the distribution T{y, ■) and set a;^ = Xt- 
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Step 4: Define px, = — r. ; r- 

Step 5: The transition from Xf to Xf^i = y is accepted witli probability 



. j 7r{y)T{y,Xt)pxA 

a = mmn, —\. (1) 

[ n[xt)T[xt,y)py] 



Tlie MTM algoritlim of Liu et al. (2000) can be viewed as a special case of the 
GMTM algorithm. In particular: 

1. If w*{yj, Xt) = 'n'{yj)T{yj, xt), the algorithm corresponds to the MTM-I scheme 
with X{yj,xt) = 1. 

'■) 

2. Similarly, if w*(y,j, xA = — - — - — -, the corresponding MTM algorithm is based 

T{xt, yj) 

on X(yj,Xf) = ^T(yj,Xt)T(xt,yj)^ . We term this scheme as MTM-inv. 

3. If vr(7/j) is replaced with its quadratic approximation, the GMTM considered 
in Pandolfi et al. (2010) results. We term this scheme as GMTM-quad. 

Our main interest is to explore situations where the weighting function is easy to 
compute so as to increase the efficiency of the algorithm. 



3 Multiple-try version of the RJ algorithm 

As Pandolfi et al. (2010) already argued, the GMTM strategy may be applied to the 
RJ algorithm with the aim of developing a simultaneous inference on both model and 
parameter space. The Generalized MTM RJ (GMTRJ) algorithm allows us to face 
some of the typical drawbacks of the RJ algorithm, first of all the necessity of an 
accurate tuning of the jump proposals in order to promote mixing among models. 
The extension consists of proposing at each step a fixed number of moves, so as 
to improve the performance of the algorithm and to increase the efficiency from a 
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Bayesian model selection perspective. It is possible to consider this strategy like an 
automatic version of the RJ algorithm, that allows us to choose the more efficient 
jump proposals on the basis of selection probabilities suitably computed. 

3.1 The algorithm 

Suppose the Markov chain currently visits model Aii with parameters 0, and let 
w*{6ap6i) be the weighting function satisfying w*{6aj,6i) > 0. The proposed strat- 
egy (GMTRJ) is based on the following: 

Step 1: Choose a subset of models = {Mg : s G A} for some index set A = 
{ai, . . . , Qk} of size k from which to propose trials. 

Step 2: Draw parameters Oa^, . . . , 0^^. of models A^ai, • • • ,-M.ak, respectively, with 
proposal density T{6i, . . . , T{Oi, OaJ. 

Step 3: Choose 6a from {6a^, . . . , ^a^} with probability given by 



Step 4: Choose a subset of models Ais = {f^t ■ t & B} for some index set B = 
{bi, . . . ,bk} where bk = i. 

Step 5: Draw parameters O^j^, . . . , 6h^_^ of A^^^, . . . , A^fc^_^, with proposal density 



T(6>,^, 6>feJ, . . . ,T(6>,^., 6>fe,_J and set 6>6, = 6,. 



Step 6: Define pg. 



Step 7: Accept the move from 6i to with probability 
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where \J{0i^6aj)\ is the Jacobian of the transformation from the current value 
of the parameters to the new value. 

It is possible to prove that the GMTRJ algorithm satisfies detailed balance condition; 
see Theorem 3.1 in the following section. 

3.2 Prove of detailed balance condition 

As is common in the MCMC approach, the generated chain has to be reversible and 
to satisfied the detailed balance condition, i.e. 

Tx{y)P{y,x) = 7i{x)P{x,y), (2) 

for every {x,y), where P{y,x) is the transition kernel from y to x. This condition 
defines a situation of equilibrium in the Markov chain, namely that the probability of 
being in x and moving to y is the same as the probability of being in y and moving 
back to X (see Robert and Casella (2004) for more details). 

In the following Theorem we demonstrate that the detailed balance condition 
holds in the generalized version of the RJ algorithm. 

Theorem 3.1. The GMTRJ algorithm satisfies detailed balance. 

Proof. The GMTRJ algorithm involves transitions to states of variable dimension, 
and consequently the detailed balance condition is now written as 

7r(6>om,6>aj = 7r{Oa^)p{0a^,e,)\j{e,,ea^)\. 

where, as above, 6i represents the current value of the parameter vector and Oa^ is 
one of the new parameters proposed with j = 1, . . . , k. 
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Suppose that 6i ^ Oaj, noting that ^a^, . . . , ^aj. are exchangeable, it holds that 

= kTT (6/,) T{Oi, 0a,) PO^^ J ...J T{0i, 0a,)... T(6>,, 0a,_,) 



mm 



. T^{0a,)n0a,.0i)Pe, 



T{0a,,,0bi) . . . T{0a,,0hk-i) d0ai • • • d0a,_-, d0b, . . . d0b,_, 
= k J ...jT{0„0a,)...n0„0a,_,) 

min{7r(0,) T{0,,0aj Pe^^, Tr{0a,) T{0a,,0d Pe. \J{0^.0a,)\} 

T{0ak,0bi) . . . T{0ak,0bk-i) d0ai ■ ■ ■ d0ak-i d0b, . . . d0bk_, 
= k7li0ajn0a,,0^)p0^\Ji0^,0a,)\ 
J ...jT{0„0aJ...n0i,0a,_J 

. r 7r{0,)Ti0,,0a,)p0^^ 1 ] 
mm < 1 



7li0a,)T{0a„0^) Pe^J{0^.0a 
T{0ak,0bi) ■ ■ ■ T{0ak,0bk-i) d0ai ■ ■ ■ d0ak^i d0b, . . . d0b,^_. 



= n{0a,) P{0a„0^)\Ji0^,0o 

as required. Note that \ J{0a„0i)\ = 1/| J(0„ 6>,J|. □ 
3.3 Choice of the weighting function 

The choice of the weighting function w*{0ap 0%) is crucial for the efficient construction 
of the jump proposal. In fact, with an useful choice of this weight, it may be possible 
to construct an algorithm that is easy to implement and that allows us to reach a 
good acceptance rate, together with a gain of efficiency. 

At this aim, we consider some special cases of the GMTRJ algorithm: 

1. if w*{0a^, 0i) = Ti{0ah)T{0ahi 0i)i then we have the MTRJ-I scheme that corre- 
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sponds to the MTM-I scheme in Bayesian estimation framework; 



2 




then we have the MTRJ-inv scheme that corre- 
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where 7r*(0a,J is the quadratic approximation of the 



target distribution, then the GMTRJ-quad scheme results; 

4. If it is not possible to simply derive the quadratic approximation of the target 
distribution it is always possible to find a suitable function that allows us to 
simplify the computations. 

In particular, in the GMTRJ-quad algorithm the quadratic approximation of the 
target distribution is given by 



where s(d) and D{0) are, respectively, the first and second derivatives of log7r(0) 
with respect to 0. Then, in the computation of the selection probabilities, after some 
simple algebra, we find an expression that does not require to compute the target 
distribution for each proposed value, saving much computing time. 

An example in which is not feasible to compute this quadratic approximation 
(considered in the fourth case above), is instead the Bayesian model selection of the 
number of unknown classes in a latent class model (example outlined in the following 
section). In this case, the weighting function may be computed as a quantity propor- 
tional to the incomplete likelihood based on the manifest distribution of the observable 
data. The estimation algorithm is, in fact, based on the concept of complete data, 
which consist of the response configuration (incomplete data) and the configuration 
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of the latent variable (or allocation variable) Zi that denotes the subpopulation in 
which the i-th individual belongs to. Thus the incomplete likelihood does not include 
the allocation variable Zi and the weighting function can be easily computed while 
still being the proposal efficient. We term this version as GMTRJ-man. 

4 Empirical illustrations 

We tested our approach in two different examples about Bayesian model selection. 
The first one concerns the selection of the covariates in a logistic regression model, 
the second one is about the choice of the number of components of a latent class 
model. Both of the examples have already been illustrated in Pandolfi et al. (2010), 
in which the former has been analyzed in some detail while the latter has been only 
briefly explained. 

4.1 Logistic regression analysis 

We considered a logistic regression model for the number of survivals, Y, in a sample 
of 79 subjects suffering a certain illness. The patient condition. A, and the received 
treatment, B, are the explanatory factors. See Dellaportas et al. (2002) for details. 
With respect to Pandolfi et al. (2010), we only focused on the Bayesian model choice 
problem, reporting more extended results. 

We considered five possible models: M.i (intercept); M.2 (intercept + A); M.^ 
(intercept + B); Ma (intercept + A + B); (intercept + A + B + A.B). The last 
model, also termed as full model, is formulated as 
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where, for i,j = 1,2, Yij, Uij and pij are, respectively, the number of survivals, 
the total number of patients and the probability of survival for the patients with 
condition i who received treatment j. Let (3 = (/3o, /32, /Ss) = (/^! /^^) /^f ? /^^2^) 
be the parameter vector of the model. As in Dellaportas et al. (2002) we used the 
prior A^(0, 8) for any of these parameters, which by assumption are also a priori 
independent. 

With the aim of testing the performance of the proposed approach, we compared 
the results of the RJ algorithm with those of the MTRJ-I, MTRJ-inv and GMTRJ- 
quad algorithms. The proposal distribution that we used to update the parameters 
and to jump from one model to another is ~ N{f3i,ap). As in Pandolfi et al. 
(2010), we evaluated the effect of the proposal distribution on the performance of 
the algorithms considering several different values of ap (0.1,0.2,0.5,1,1.5,2,2.5). 
Moreover, for the last two algorithms, we considered three different numbers of trials 
{k = 10,50,100). All the Markov chains were initialized from the full model with 
starting point /3 = 0, and their moves were restricted to adjacent models (which 
increases or decreases the model dimension by 1). In the MTRJ-I, MTRJ-inv and 
GMTRJ algorithms, the MTM strategy is only applied in drawing the parameter 
values. For all the algorithms, every sweep consists of a move aimed at updating the 
parameters of the current model (performed through the GMTM-quad algorithm), 
and of a transdimensional move aimed at jumping from a model to another. Finally 
each Markov chain ran for 1,000,000 iterations discarding the first 50,000 as burn-in. 

The output summaries are illustrated in Table 1 for ap = 0.5 and a number of 
trials k = 10; all of the approaches gave similar results. Figure 1 illustrates the 
evolution of the ergodic probabilities for the two most probable models (A^2 and 
A^4) in the first 60,000 iterations. We considered two different values of ap (0.5,2) 
and, for the MTRJ and GMTRJ-quad algorithms, a number of trials k equal to 10. 
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Table 1: Posterior model probabilities for the logistic example with dp = 0.5 and 
k = 10 



Model RJ MTRJ-I MTRJ-inv GMTRJ-quad 



Mi = fi 0.0050 0.0049 0.0049 0.0049 

M2 = fi + f^t 0.4889 0.4922 0.4930 0.4928 

M3 = fi + fif 0.0111 0.0111 0.0113 0.0112 

= At + /i^ + /if 0.4425 0.4398 0.4391 0.4380 

= fi + fif + fif + fif-^ 0.0524 0.0519 0.0516 0.0521 



We can see that the RJ algorithm has slower convergence than MTRJ-I, MTRJ-inv 
and GMTRJ-quad algorithms, especially when the proposal is not adequate {ap = 2). 

The efficiency of the algorithms is measured on the basis of the ratio R = a'^/a'^, 
where cx^ is the Monte Carlo variance of the mean estimator and cr^ is the asymptotic 
variance of the same estimator based on the draws generated by the algorithm of 
interest. In particular, is computed on the basis of the autocorrelation between 
these draws and takes into account the permanence in the same model. Limited to 
the case of /c = 50 trials, and to the RJ, MTRJ-inv and GMTRJ-quad algorithms 
(we do not report the results of the MTRJ-I algorithm since they are quite similar to 
those of the MTRJ-inv algorithm), the results of the above comparison are reported 
in Figure 2, which shows how the value of the index R (corrected for the computing 
time) behaves as ap increases. Table 2 summarizes these results for all the values of k 
considered and for three different values of ap (0.1, 0.5, 2). In Figure 3 we also report 
the efficiency ratios corrected again for the computing time. 

We observe that, for most of values of ap there is a consistent gain of efficiency 
of the MTRJ-inv and GMTRJ-quad algorithms with respect to RJ algorithm. This 
gain of efficiency corresponds to an increase of the acceptance rate, which is higher 
in both the MTRJ-inv and the GMTRJ-quad algorithms with respect to the RJ 
algorithm (see Table 3 which shows the acceptance rate for all the values of ap and 
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Figure 1: Ergodic posterior model probability of models A^2 and Ai^ for the logistic 
regression example: (a) Model A^2 , CTp = 0.5, k = 10, (b) Model Ai2, CTp = 2, k = 10, 
(c) Model Mi, ap = 0.5, k = 10, (d) Model M4, ap = 2, k = 10 

k considered). It is also worth noting that, the GMTRJ-quad algorithm leads to an 
acceptance rate which is slightly lower than that of the MTRJ-inv algorithm, but this 
reduction is more than compensated by the saved computing time due to the use of 
the quadratic approximation. Overall, we can see that the proposed GMTRJ-quad 
algorithm outperforms the other two algorithms, when the computing time is properly 
taken into account. 
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Table 2: Values (divided by 1000 and adjusted for the computing time) of R for the 
logistic example 
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1.53 








(Tp 




2 


2.30 


12.40 


5.45 


10.81 


22.74 








CTp 




n 1 
U.i 


7.32 


19.37 


11.50 


15.67 


4.71 






MTRJ-inv 


CTp 




n c; 

U.J3 


2.02 


2.86 


2.58 


2.38 


1.82 








CTp 




o 

z 


2.10 


5.01 


3.32 


4.58 


7.34 


k = 


50 


























CTp 




0.1 


5.99 


13.13 


9.93 


10.22 


4.05 






GMTRJ-quad 


CTp 




0.5 


1.59 


2.56 


2.02 


2.11 


1.57 








CTp 




2 


1.70 


4.70 


2.81 


3.80 


6.87 








CTp 




0.1 


8.72 


23.66 


13.86 


18.94 


6.44 






MTRJ-inv 


CTp 




0.5 


2.62 


3.57 


3.41 


3.01 


2.42 








CTp 




2 


2.83 


5.18 


3.71 


4.40 


6.09 


k = 


100 


























CTp 




0.1 


6.45 


16.81 


10.17 


13.41 


4.30 






GMTRJ-quad 


CTp 




0.5 


1.80 


3.04 


2.40 


2.49 


1.90 








CTp 




2 


1.94 


4.03 


2.72 


3.30 


4.50 



Table 3: Acceptance rate for the logistic example 







CTp 

0.1 0.2 0.5 1 1.5 2 2.5 




RJ 


9.24 14.07 10.84 3.56 1.46 0.72 0.43 


A; = 10 


MTRJ-inv 
GMTRJ-quad 


11.53 21.82 33.84 20.42 10.83 6.13 3.82 
11.44 21.72 31.19 19.01 10.24 5.86 3.68 


A; = 50 


MTRJ-inv 
GMTRJ -quad 


12.89 26.27 40.71 35.81 26.28 18.31 12.82 
12.97 25.91 35.52 31.29 23.65 16.74 11.99 


k = 100 


MTRJ-inv 
GMTRJ-quad 


13.60 27.98 41.53 39.35 32.69 25.15 18.95 
13.56 27.48 35.96 33.85 28.79 22.62 17.31 
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Figure 2: Values of the index R as increases with /c = 50 (computing time taken 
into account): (a) RJ, (b) MTRJ-inv, (c) GMTRJ-quad 



4.2 Latent class analysis 

We considered the same latent class model and the same data considered by Goodman 
(1974), which concern the responses to four dichotomous items of a sample of 216 
respondents. These items were about the personal feeling toward four situations 
of role conflict; then there are four response variables collected in the vector Y = 

(Fi,F2,i^3,n). 

Parameters of the model are the class weights vr^ and the conditional probabilities 
of success Xj\c, where c = 1, . . . , C, with C denoting the number of unknown classes 
and j = 1, ... ,4. On the basis of these parameters, the probability of a response 
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Figure 3: Efficiency ratio as (Xp increases with A; = 50 (computing time taken into 
account): (a) RJ versus MTRJ-inv, (b) RJ versus GMTRJ-quad. 



configuration y, with elements is given by 



c 



c=l j=l 



A priori, we assumed a Dirichlet distribution for the parameter vector (tti, . . . , ttc) ^ 
D{5, . . . , 5), with 5 = 1, and independent Beta distributions for the parameters \j\c. 
We set both the parameter of the Beta distribution equal to 1. Finally, for C we 
assumed a uniform distribution between 1 and Cmax, where Cmax is the maximum 
number of classes we assumed a priori. 

The objective of the analysis is the inference about the number of classes (C), 
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and the parameters Xj\c and tTc- At this aim we exploited the approach of Richardson 
and Green (1997), who apphed the RJ algorithm on the analysis of finite mixtures of 
normal densities with unknown number of components. On the basis of this approach, 
we performed a RJ strategy where the moves are restricted to models with one more or 
one less component. We associated to each subject in the sample an allocation variable 
Zi equal to c when subject i belongs to latent class c. The a priori distribution of 
each Zi depends on the class weights vTc; see also Cappe et al. (2003). This strategy 
is called completing the sample, and (2, y) is referred to as the complete data. Under 
this formulation the complete data likelihood has logarithm 

z y 

where 6 is the vector of all model parameters arranged in a suitable way, rrizy are 
the frequencies of the contingency table in which the subjects are cross-classified 
according to latent configuration z and response configuration y, and f{z,y) is the 
manifest distribution 

f{z,y)=7r,llXl{l-X,\^)^'-y^\ 

j 

We considered two different pair of dimension- changing moves: split-combine and 
birth-death which are performed with probability equal to 0.5 respectively. At every 
iteration, split-combine or birth-death moves are preceded by a Gibbs move aimed 
at updating the parameters of the current model, sampling from the full conditional 
distribution. We tackled the label switching problem by post-processing the output of 
the algorithm: we used relabeling-invariant priors and ran the MCMC unconstrained. 
Then the MCMC output was sorted on the basis of the class weights. 

Suppose that the current state of the chain is (C, 6c) ■ In the split-combine move, 
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we first make a random choice between attempting to split or combine with probabihty 
0.5. Of course, if C = 1 we always propose a split move and if C = Cmax we always 
propose a combine move. The split proposal consists of choosing a class c* at random 
and split into two new ones, labeled ci and C2- The corresponding parameters are 
split as follows: 

1. TTci = """c* ^ ^ ^-iid Tic2 = T^c" X (1 — n) with u ~ Be{a, (3); 

2. Aj|ci ~ 5e(r x Xj\c*,T x (1 - \j\c*)) and Aj|c2 ~ Be{r x Aj|c*, r x (1 - \j\c*)), for 
j = 1, . . . , 4, where r is a given constant that has to be tuned in order to reach 
an adequate acceptance rate. 

When the split move is accomplished, it remains only to propose the reallocation of 
those observations with Zi = c* between ci and C2. The allocation is done on the basis 
of probabilities computed analogously to the Gibbs allocation. 

In the reverse combine move a pair of classes (01,02) is picked at random and 
merged into a new one, c*, as follows: 

2. Aj|c. ~ Be{T X A„, r X (1 - A^)), with A„ = (Aji^i + Aj|cJ/2 for j = 1, . . . , 4. 

The reallocation of the observations with Zi = 0\ or Zi = 02 is done by setting Zi = 0*. 

The split move is accepted with probability min{l. A} whereas the combine move 
is accepted with probability min{l,yl~^} where A can be computed as 



r{y\Oc+i)p{0c^MC+l) ^ {C + l)\ Pe(C + l)/[(C + l)C/2] 

X ; X ' ; — ; X 



f*iy\0c)piOc)piC) C\ Ps{C)/C Palloc 

_ ny\0k+i)p{Ok+i) Pc{c + i) nj^?(A,icO I I 

(3) 
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where f*{y\Oc) is the complete data hkehhood not in logarithmic form, and g 
denotes the Beta density. Ps{C)/C and Pc(C + 1)/[(C + l)C/2] are respectively the 
probabilities to split a specific class out of C available ones and to combine one of 
[C + l)C/2 possible pairs of classes. The factorials and the coefficient 2 arise from 
combinatorial reasoning related to label switching; Paiioc is the probability that this 
particular allocation is made and \Jspiit\ is the Jacobian of the transformation from 
6c to dc+i, which is equal to vTc* . 

In the birth-death move we first propose a birth or a death move along the same 
lines as above; then a birth is accomplished by generating a new empty class, i.e. a 
class to which no observation is allocated, denoted by c*. To do this we draw tTc* from 
Be{l, C), where C is the current number of classes, and rescale the existing weights, 
so that they sum to 1, as tt^ = 7rc(l — tTc*). The new parameters Xj\c* for j = 1, . . . , 4 
are drawn from their prior distribution. 

For the death move, a random choice is made between the empty classes; the 
chosen class is deleted and the remaining class weights are rescaled to sum to 1. The 
allocation of the Zi is unaltered because the class deleted is empty. 

The use of the prior distribution in proposing the new values for Aj|c* leads to 
a simplification of the resulting acceptance probability, min{l,74}, that, after some 
calculation, reduces to 



Here, the first term is the prior ratio, and the remaining terms contain the proposal 
ratio; ■) is the Beta function, Cq is the number of empty classes and the likelihood 
ratio is 1. The Jacobian is computed as \Jbirth\ = (1 ~ tTc*)*""^. The death move is 



accepted with probability min{l, A ^}. 
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In this application, we compared the standard RJ algorithm with two different 
versions of the MTRJ-inv algorithm. In the first version, named MTRJ-inv-I, the 
MTM strategy consists of proposing, at each step, a fixed number k of different classes 
to split or combine or to add or delete; then the corresponding parameters are drawn 
from their respective proposal distributions. In the second version, named MTRJ-inv- 
II, we choose a class to split or combine (or to add or delete) and the MTM strategy 
is only applied in drawing the parameter values. Moreover, we applied the GMTRJ- 
man algorithm to the data, in which the selection probabilities are computed as a 
value proportional to the incomplete likelihood. In fact, this is a situation in which 
the quadratic approximation of the target distribution can not be easily computed. 
The incomplete likelihood does not include the allocation variables Zi, which have 
not to be reallocated for each proposed trial. This allows us to save much computing 
time. As in the MTRJ-inv algorithm, we tested two versions of the GMTRJ-man 
(GMTRJ-man-I and GMTRJ-man-II) using the same criterion as above. 

We ran each Markov chain for 2,000,000 sweeps following a burn-in of 400,000 
iterations; moreover, we set Cmax = 20, a = P = 2 and r = 10. For both the ver- 
sions of the MTRJ-inv and GMTRJ-man algorithms, we also considered two different 
number of trials {k = 5, 10). 

Table 4 shows the estimated posterior probabilities of each class for all the algo- 
rithms, using k = 10. We can see that all the algorithms give quite similar posterior 
probabilities of the number of classes; the two most probable models are the model 
with two classes and the model with three classes. Table 5 illustrates the proportion 
of moves accepted. The plot of the first 40,000 values of C after the burn-in is given 
in Figure 4 while, in order to check the stationarity. Figure 5 illustrates the plot of the 
cumulative occupancy fractions for different values of C (2,3,4,5) against the number 
of sweeps, for the first 1,000,000 iterations. In both the Figures we considered the 
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MTRJ-inv-II and GMTRJ-man-II algorithms with a number of trials k = 10. 



Table 4: Posterior distribution of C for the latent class example with A; = 10 



c 


RJ 


MTRJ-inv-I 


MTRJ- 


•inv-II 


GMTRJ- 


-man-I 


GMTRJ- 


man-II 


1 


0.000 


0.000 




0.000 




0.000 




0.000 


2 


0.210 


0.211 




0.211 




0.213 




0.213 


3 


0.218 


0.217 




0.213 




0.215 




0.225 


4 


0.174 


0.180 




0.182 




0.178 




0.176 


5 


0.131 


0.130 




0.134 




0.131 




0.131 


6 


0.092 


0.094 




0.093 




0.091 




0.090 


7 


0.063 


0.063 




0.061 




0.060 




0.060 


8 


0.040 


0.040 




0.040 




0.040 




0.039 


9 


0.026 


0.025 




0.025 




0.026 






10 


0.017 


0.016 




0.016 




0.017 




U.UiO 


on 


0.029 


0.025 




0.024 




0.029 




u.uzo 


Table 5: 


Acceptance rate for the latent class example with k = 10 




Move 


RJ 


MTRJ-inv-I 


MTRJ- 


•inv-II 


GMTRJ- 


•man-I 


GMTRJ- 


man- 1 1 


split 


1.99 


5.89 




7.24 




5.84 




4.03 


combine 


1.98 


5.87 




7.23 




5.83 




4.04 


birth 


5.34 


11.11 




11.00 




8.88 




8.64 


death 


5.35 


11.08 




10.95 




8.87 




8.63 



Finally, in Table 6 are reported the values of the autocorrelation of the chain, for 
the most probable models (with a number of classes between 2 and 7), resulting in 
the five algorithms. We report the values corrected for the computing time. 

From the results of the comparison, we can see that the MTRJ-inv and GMTRJ- 
man algorithms have a higher acceptance rate with respect to the RJ algorithm, low- 
ering the autocorrelation of the chain. In particular, the GMTRJ-man-II algorithm, 
with k = 10, outperforms the other algorithms in terms of autocorrelation. 

Moreover, although all the algorithms mix well over C, with few excursions in 
very high values of C, the mixing of the MTRJ-inv and GMTRJ-man algorithms 
is better than of the RJ algorithm (Figure 4). From Figure 5, we can see that for 
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40000 



Figure 4: The number of latent classes in the first 40,000 iteration after the burn-in: 
(a) RJ, (b) MTRJ-inv-II with k = 10, (c) GMTRJ-man-II with k = 10 



Table 6: Values (divided by 1,000 and adjusted for the computing time) of the auto- 
correlation of the chain for the latent class example 







Number of classes 
2 3 4 5 6 7 




RJ 


727.81 257.66 163.67 145.60 164.42 163.13 


k = 5 


MTRJ-inv-I 
MTRJ-inv-II 
GMTRJ-man-I 
GMTRJ-man-II 


773.52 313.52 192.27 179.17 181.78 190.16 
714.76 278.56 160.61 157.50 157.04 172.46 
597.01 225.61 130.92 122.17 138.89 148.46 
498.98 215.98 121.50 118.53 132.22 136.81 


k=10 


MTRJ-inv-I 
MTRJ-inv-II 
GMTRJ 
GMTRJ-II 


944.16 443.38 264.57 232.33 292.87 301.68 
908.65 340.09 192.69 180.90 204.55 222.76 
590.2 247.27 163.21 141.94 150.91 180.54 
407.41 196.37 112.51 115.73 119.12 136.28 
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Figure 5: Occupancy fraction for different values of C: (a) RJ, (b) MTRJ-inv-II with 
k = 10, (c) GMTRJ-man-II with k = 10 



all the algorithms the burn-in is more than adequate to achieve the stability in the 
occupancy fraction, but the proposed algorithms can achieve the stability in a shorter 
period. 



5 Conclusions 

In this paper we presented an extension of the RJ algorithm (GMTRJ algorithm) 
that allow us to explore the different models in a more efficient way, automating the 
choice of the jump proposal. The idea is to exploit the MTM paradigm in order to 
propose a fixed number of transdimensional moves, and then select one of them on the 
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basis of suitable selection probabilities. This probabilities are computed on the basis 
of a weighting function that could be arbitrary chosen. The choice of this function is 
crucial for the effectiveness of the algorithm. 

We illustrated several special cases of the algorithm coming out from this choice. 
Some of these algorithms may be seen as the corresponding versions, in Bayesian 
model choice context, of the MTM algorithm introduced by Liu et al. (2000) for 
Bayesian estimation problems; we termed these algorithms as MTRJ-I and MTRJ- 
inv. We also introduced alternative versions of the GMTRJ algorithm that could 
be useful in different model selection problems. The first version replaces the target 
distribution in the weighting function with its quadratic approximation so that the 
resulting algorithm, that we termed as GMTRJ-quad, is more effective that the stan- 
dard RJ algorithm, without being more computationally intensive. We also demon- 
strated that, when for variable selection problem the computation of the quadratic 
approximation is not feasible, it is still possible to derive useful weighting functions 
that lead to an efficient algorithm. 

We illustrated the potentialities of this approach by two real examples. The first 
was about the selection of covariates of a logistic regression model. For this case we 
compared the performance of the RJ algorithm with the performance of the MTRJ-I, 
MTRJ-inv, and GMTRJ-quad algorithms. We showed how, for different values of the 
parameters of the proposal distribution, the proposed algorithms always outperform 
the RJ algorithm. In particular, we have a lower autocorrelation of the chain and a 
higher acceptance rate. Moreover, the quadratic approximation allows us to obtain 
a gain of efficiency with respect to the other algorithms, when the computing time is 
properly taken into account. 

The second example was about the estimation of the number of latent classes in 
a latent class model. This is an example in which the computation of the quadratic 
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approximation of the target distribution is not easy to derive. We therefore proposed 
to use the incomplete hkehhood as weighting function; this choice allows us to save 
much computing time without loss of efficiency. The resulting version of the GMTRJ 
algorithm was named GMTRJ-man. The results obtained from applying the proposed 
approach to the latent class example confirm that, with the generalized MTM version 
of the RJ algorithm, is possible to increase the number of trials without increasing 
the computational effort, so as to better explore the state space. 

Future research will be devoted to explore different types of the weighting function 
and to better evaluate how this choice affects the efficiency of the resulting algorithm. 
The aim is to automatize as much as possible the construction of the transdimensional 
moves. 
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